14 research outputs found
K-Theory Of Root Stacks And Its Application To Equivariant K-Theory
We give a definition of a root stack and describe its most basic properties. We then recall the necessary background (Abhyankar's lemma, the Chevalley-Shephard-Todd theorem, Luna's étale slice theorem) and prove that, under some conditions, a quotient stack is a root stack. We then compute the G-theory and K-theory of a root stack. These results are used to formulate a theorem on the equivariant algebraic K-theory of schemes.
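For readers unfamiliar with the construction, the standard definition (following Cadman's formulation; the abstract above does not spell it out) can be sketched as follows: given a scheme $X$, a line bundle $L$ on $X$ with a global section $s$, and a positive integer $n$, the $n$-th root stack assigns to a test scheme $T$ the groupoid of $n$-th roots of the pullback of $(L, s)$:

```latex
\[
  \sqrt[n]{(L,s)/X}\,(T) \;=\;
  \Bigl\{\, (f, M, t, \varphi) \;\Bigm|\;
    f\colon T \to X,\;
    M \in \operatorname{Pic}(T),\;
    t \in \Gamma(T, M),\;
    \varphi\colon M^{\otimes n} \xrightarrow{\ \sim\ } f^{*}L,\;
    \varphi(t^{\otimes n}) = f^{*}s
  \,\Bigr\}
\]
```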
DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation
With the ever-growing size of pretrained models (PMs), fine-tuning them has
become more expensive and resource-hungry. As a remedy, low-rank adapters
(LoRA) keep the main pretrained weights of the model frozen and just introduce
some learnable truncated SVD modules (so-called LoRA blocks) to the model.
While LoRA blocks are parameter-efficient, they suffer from two major problems:
first, the size of these blocks is fixed and cannot be modified after training
(for example, if we need to change the rank of LoRA blocks, then we need to
re-train them from scratch); second, optimizing their rank requires an
exhaustive search and effort. In this work, we introduce a dynamic low-rank
adaptation (DyLoRA) technique to address these two problems together. Our
DyLoRA method trains LoRA blocks for a range of ranks instead of a single rank
by sorting the representation learned by the adapter module at different ranks
during training. We evaluate our solution on different natural language
understanding (GLUE benchmark) and language generation tasks (E2E, DART and
WebNLG) using different pretrained models such as RoBERTa and GPT with
different sizes. Our results show that we can train dynamic search-free models
with DyLoRA at least 4 to 7 times faster than LoRA (depending on the task)
without significantly compromising performance. Moreover, our models
perform consistently well over a much wider range of ranks than LoRA.
Comment: Accepted to EACL 202
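The rank-truncation idea above can be sketched in a few lines of NumPy (a hypothetical illustration with made-up sizes, not the authors' implementation): the first r rows of the down-projection A and the first r columns of the up-projection B already form a valid rank-r adapter, so one pair of trained factors can serve every rank up to r_max.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, r_max = 16, 16, 8   # hypothetical sizes, not from the paper

# Frozen pretrained weight and LoRA-style low-rank adapter factors.
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r_max, d_in)) * 0.01  # "down" projection
B = np.zeros((d_out, r_max))                   # "up" projection, zero-init

def forward(x, r):
    """Apply the adapted weight truncated to rank r: slicing the factors
    yields a rank-r update, which is what lets DyLoRA train one model
    that works at every rank in 1..r_max."""
    delta = B[:, :r] @ A[:r, :]
    return (W + delta) @ x

# During training, DyLoRA samples a rank each step and updates only the
# truncated slices; at inference, any rank can be chosen without retraining.
x = rng.standard_normal(d_in)
outputs = {r: forward(x, r) for r in (2, 4, 8)}
```

Because B is zero-initialized, the adapter starts as an exact no-op at every rank, mirroring standard LoRA initialization.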
Attribute Controlled Dialogue Prompting
Prompt-tuning has become an increasingly popular parameter-efficient method
for adapting large pretrained language models to downstream tasks. However,
both discrete prompting and continuous prompting assume fixed prompts for all
data samples within a task, neglecting the fact that inputs vary greatly in
some tasks such as open-domain dialogue generation. In this paper, we present a
novel, instance-specific prompt-tuning algorithm for dialogue generation.
Specifically, we generate prompts based on instance-level control code, rather
than the conversation history, to explore their impact on controlled dialogue
generation. Experiments on popular open-domain dialogue datasets, evaluated on
both automated metrics and human evaluation, demonstrate that our method is
superior to prompting baselines and comparable to fine-tuning with only 5%-6%
of the total parameters.
Comment: Accepted at ACL 2023 in Findings
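A minimal sketch of the instance-specific prompting described above (all names and sizes are hypothetical, not the paper's code): each control code indexes its own learnable soft prompt, which is prepended to the token embeddings, instead of one fixed prompt shared by every sample in the task.

```python
import numpy as np

rng = np.random.default_rng(1)

d_model, prompt_len, n_codes = 8, 4, 3   # hypothetical sizes

# One learnable soft prompt per control code (instance-level conditioning),
# rather than a single task-wide prompt.
prompt_table = rng.standard_normal((n_codes, prompt_len, d_model))

def build_input(control_code, token_embeds):
    """Prepend the code-specific soft prompt to the token embeddings;
    the frozen language model then attends over [prompt; tokens]."""
    prompt = prompt_table[control_code]
    return np.concatenate([prompt, token_embeds], axis=0)

tokens = rng.standard_normal((5, d_model))   # a 5-token utterance
inp = build_input(2, tokens)                 # shape (prompt_len + 5, d_model)
```

Only prompt_table would receive gradients; the backbone stays frozen, which is where the 5%-6% parameter figure in the abstract comes from.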
Continuation KD: Improved Knowledge Distillation through the Lens of Continuation Optimization
Knowledge Distillation (KD) has been extensively used for natural language
understanding (NLU) tasks to improve a small model's (a student) generalization
by transferring the knowledge from a larger model (a teacher). Although KD
methods achieve state-of-the-art performance in numerous settings, they suffer
from several problems limiting their performance. It is shown in the literature
that the capacity gap between the teacher and the student networks can make KD
ineffective. Additionally, existing KD techniques do not mitigate the noise in
the teacher's output: modeling the noisy behaviour of the teacher can distract
the student from learning more useful features. We propose a new KD method that
addresses these problems and facilitates the training compared to previous
techniques. Inspired by continuation optimization, we design a training
procedure that optimizes the highly non-convex KD objective by starting with
the smoothed version of this objective and making it more complex as the
training proceeds. Our method (Continuation-KD) achieves state-of-the-art
performance across various compact architectures on NLU (GLUE benchmark) and
computer vision tasks (CIFAR-10 and CIFAR-100).
Comment: Published at EMNLP 2022 (Findings)
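The continuation idea — first optimize a smoothed surrogate, then gradually recover the original non-convex objective — can be illustrated on a toy 1-D problem (a generic sketch of continuation optimization, not the paper's actual KD loss):

```python
import numpy as np

def objective(x, beta):
    """Non-convex target blended with a convex surrogate:
    beta=1 gives the fully smoothed (convex) objective,
    beta=0 recovers the original non-convex one."""
    nonconvex = np.sin(3 * x) + 0.1 * x**2
    convex = x**2
    return (1 - beta) * nonconvex + beta * convex

def grad(x, beta, eps=1e-5):
    # Central finite difference, to keep the sketch dependency-free.
    return (objective(x + eps, beta) - objective(x - eps, beta)) / (2 * eps)

x = 2.5                                   # start far from the global basin
for beta in np.linspace(1.0, 0.0, 50):    # anneal the smoothing away
    for _ in range(20):                   # a few gradient steps per level
        x -= 0.05 * grad(x, beta)
```

Descending the convex surrogate first guides the iterate into a good basin, so by the time beta reaches 0 the optimizer ends up far below where plain gradient descent on the non-convex objective from x = 2.5 would typically land.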
Mathematical Challenges in Deep Learning
Deep models have dominated the artificial intelligence (AI) industry since
the ImageNet challenge in 2012. The size of deep models has been increasing
ever since, which brings new challenges to the field, with applications in
cell phones, personal computers, autonomous cars, and wireless base stations.
Here we list a set of problems spanning training, inference, generalization
bounds, and optimization, with some formalism, in order to communicate these
challenges to mathematicians, statisticians, and theoretical computer
scientists. This is a subjective view of the research questions in deep
learning that benefit the tech industry in the long run.